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Abstract 


Full account of the latent regression model for the National Assessment of Educational Progress 
is given. The treatment includes derivation of the EM algorithm, Newton-Raphson method, and 
the asymptotic standard errors. The paper also features the use of the adaptive Gauss-Hermite 
numerical integration method as a basic tool to evaluate expectations necessary to perform the 
parameter estimations. 

Key words: Latent regression, EM algorithm, Newton-Raphson algorithm, Gauss-Hermite 
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1 Introduction 


Educational assessments such as the National Assessment of Educational Progress (NAEP) 
and Trends in International Mathematics and Science Studies contain broad content frameworks 
and consequently large item pools. Yet these assessments are governed by restrictions on the time 
and, subsequently, the number of items each student is requested to respond to. An efficient way 
to address this dilemma is to systematically distribute items among students in a so-called matrix 
design. In a matrix design, a given student is only administered a certain randomly assigned 
subset of items. 

Not-so-pleasant implications of this matrix design are that only limited information is 
available about a specific individual and that comparisons between students are complicated. 

To further enhance the ability to produce the required set of estimates, this paper investigates 
a method to address these issues in which a large collection of background variables is created 
and responses corresponding to these variables are collected via several survey instruments. A 
regression of student’s ability on these background variables is then incorporated into the existing 
marginal item response theory (IRT) model to create the main object of interest in this paper, the 
latent regression item response theory model. When only population subgroup relevant information 
is the concern of the assessment, as is the case with NAEP, this approach has been found to be 
satisfactory and became the workhorse of this sort of large scale assessments (Mislevy, 1984, 1985). 

The structure of the paper is as follows. Some preliminary notes on mathematical notations 
are followed by the detailed account of the latent regression item response theory model. The next 
three sections present the usual way of parameter estimation via maximum likelihood method 
coupled with an application of the successive approximation technique (EM algorithm). Special 
emphasis to the adaptive Gauss-Hermite numerical integration method. There is a section about 
the common population IRT model (mainly for comparison purposes) followed by a section about 
the detailed account of the computation of asymptotic standard errors. Concrete derivations of 
the Newton method is given along with a discussion about its implementability in the latent 
regression framework complete the paper. 

2 Notes on Notations 

The definitions and notations introduced in this section are fairly standard in linear algebra. 
Nevertheless, they are included here in an effort to make the paper self-contained. The interested 
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reader may find Halrnos (1974), Harville (1997), and Magnus and Neudecker (1999) useful for a 
further study in linear algebra. 

Tensor products of vectors are defined as 

<g> A ' : M ni x • • • x M nK -»• M ni <8> • • • <8) , 

(ai,..., ax) 1 —> cii <8> • • • <8> ax, 

where 

(o-l 0'I\)i\...iK — (®l)ii • • • 

As an example, observe the map, 

R n <8> M n —> M n (R), (a <8> b)ij = afij. 

A general tensor, by definition, is a linear combination of elements of the form a\ <8> ■ ■ ■ <8> ax. The 
extension of the tensor product to matrices and higher order tensors follow the same idea (e.g., 
(A <8> B)ijki = AijBki, where A and B are ordinary square matrices). As a shorthand for the n-fold 
tensor product, 

a® n = a ® ® a. 

The symmetric tensor product of two tensors is defined as 

A® S B = l(A<g> B + B <g> A). 

The scalar product is a bilinear map, 

n 

( | ) : r X 1" ^ R, (a, b) ^ (a \ b) := 

i —1 

It naturally extends to any tensor product, 


( | ) : M ni <g> • • • <g> R nK x M ni <8) • • • <g> R nK -*• R, 

( A,B ) i—► {A | B) := ^ Ai 1 ,„i K Bi 1 „,i K . 

h,—,iK 

It will be convenient for the purposes of this paper to extend the scalar product for a specified set 
of indices of a tensor. The result is going to be a tensor of type determined by the unused indices. 
That is, 

< I >/ .../ : K ni <8> • • • <8 M nif R^, 
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(A,B) (A | B) :i Ir Ji Jr :— 

^1 ,... yi-p 

where the summation indices of A appear at the positions I\,... ,I r and the that of B appear 

u k_ y{ l _ mn 

at the positions Ji,..., J r . Moreover, N = — z ~ 1 ni J n ^ . An example may illuminate the idea 
better: 

(A | -B) 12,14 — ^ ' A,jB t] , that is ((A | B} 3 2 \±)kimn — ^ ' Aijj~iBi mn j. 

i,j i,3 

Even the usual matrix multiplication can be expressed in this way: 

Aa= (A | a). 2 l = ^ A^m eT, A E M n (K), a 6 M n , 

i 

AB = (A | £> 2)1 = J2 A -iBi. € M n (R), A, B £ M n (R). 

i 

The quadratic form is defined as, 

n 

(||): M n x M n (M) x M n —> M, (a, A, b) i—> (o | A | b) := diAijbj , 

*>. 7=1 

which can also be written as a scalar product of two tensors: 

(a | A | 6) = (a 0 b \ A). 


3 Likelihood of Latent Regression 

The marginal latent regression item response theory (MLR-IRT) model is given by the 
log-likelihood (Mislevy, 1984, 1985; von Davier, Sinharay, Oranje, & Beaton, 2007), 


where 



J 

Li{0) '■= P(yi I e,B) = HP 3pl ’ pc (y ij \e,f3 j ) 

3 = 1 


(1) 

( 2 ) 


is the likelihood of the response vector y* of student i given the ability 8 E (where K is 


the number of dimensions or subscales) and item parameters B = {(5 \,... ,/3j). The number of 


response categories mj for item j splits the item pool into two disjoint subsets: dichotomous 
items {mj = 2), and polytomous items {mj > 2). For dichotomous items / 3j = {aj,bj,Cj ) and 
for polytomous ones /3j = {aj, bji,bj mj ). The design vector q 3 E {0,1} A for item j is also 
introduced so that q 3 ,k = 1 if item j is associated with subscale fc; otherwise q 3 ,k = 0 . 
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The probability of the actual response of student i to item j as a function of 9 E is given 


by 

y, if Vij = 1 , 

\ (3) 

«„■ I «>->,) ) ’ lf ^ = o, 

for dichotomous items and by 


E I d )~ b jv) 

P pc (Vij I Mj) = —-—it -, if Vij = h £ { 1 ,. • (4) 

^ E <7«© I *>-&*») 

2^ e™ =1 

U= 1 

for polytomous items (Birnbaum, 1968; Muraki, 1992). 

The population distribution is multivariate normal 


P 3p \yij I e,Pj) = { 


(l-Cj) 


Cj + 

1 + e ~ a 3"' 1 i 

(1 -Cj) (l- 


1 +e~ a i 


<p(d\TXi,Y) 


_ 1 p -\(e-Txj I E - 1 I e-Txi) 

(27r) fc / 2 y / det(E) 


(5) 


Here, Xi £ M 1, is the vector of background variables, T £ Mk,l( K) is the matrix of regression 
coefficients, while E £ is the covariance matrix of the subscales. Note, that while E is 

common across the population, the mean Tx* is governed by the background variables and can be 
different for each student. Also introduced is 


A f(L.i)= f P{ Vi | 0)^(0; rx;,E)d A '0, (6) 

for the normalization of the likelihood Lj. 


4 Maximum Likelihood Solution for Latent Regression 

Useful references for the calculus involved in the rest of the paper may be Harville (1997) and 
Magnus and Neudecker (1999). The goal is to maximize the log-likelihood (1) with respect to T 
and E assuming that item parameters are known. To this end, first compute the derivative of L 
with respect to the matrix element 7 ^ of T: 


dL 

1 N 

P(Vi 1 0) 

f d(9-T Xi 

E” 1 

dlki 

2 ^ 

i= 1 

M(L,) . 

JR K 

ki 


1 N 

--V 

f e(o- 

- Txi | E _1 9 - 

- Txj) 


2 ^ 

i= 1 

Jr k 

d'Yki 



— Txi 


'-<p(0-,T Xi , E)d A 'fl 


d m, 


(7) 
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where //,; is the probability measure for student i: d m = y?(0; Ta^, X)d A $. Now, 

-^-(e-vxi Is- 1 |e-rxi) 
dlki x " 

(k) 

= -((0,..., Xu ,..., 0) | XT 1 | 0 - r^j) 

(k) 

-(0 - Txi | S" 1 | (0,, xu, 0)) 

= x i i(T,~ 1 Tx i ) k + (r^i ■ X _1 ) fe a^ - x i i{Tj~ 1 0) k - (0 ■ X -1 )*.^. (8) 


(k) 

Here x means that x appears at the kth. position, a ■ B denotes the usual right action of matrix 
B on the vector a: 

(o. • B'jk — ''y \ a r B r k- 

r 

Using (8), the equation = 0 can be written as 

N N 

y~] xui^Tx^k + (r Xi ■ T,~ 1 ) k xu = ^ + (0i ■ S _1 ) fc xa (9) 

2=1 i= 1 

which, using that X is symmetric, becomes 

N N 


^X iZ (X l TXi) k = ^Xj;(X 1 ( 


i)k* 


i=1 i=1 

Here 0 t is the expectation of 0 for student i: 

0i-=f 0d m. 


Multiplying with X yields 


This is then rewritten as 


N 


N 


^2 x u{Txi) k = 'ExuO* 


i=1 


2—1 


N 


N 


( ^2 xi <g) xi | r t ) 21 = y] xi ® 

2=1 2=1 

Therefore, after multiplying (13) by x i ® > the result is 


N 


-1 


r HE 


Xi 09 Xi 


r N N 

^Xi® 0i 


. 2 = 1 


. 2=1 


( 10 ) 


( 11 ) 


( 12 ) 


(13) 


(14) 


Note that this is an implicit equation, since for 0j t the knowledge of T and X would be required. 
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The derivative of L with respect to the matrix element a k k' of S is 

dL f ddet(X!) - 3 1 

= ^.L^^ det<s)5dft 

1 * r die - Txi I S ' 1 I 9 - Yxi) 


To proceed, use the following computations. 

det(S)l gd ^ t(S) 2 = -Idet(E)~ 1 (2 - <W)&k' = -J(2 - S kk ^„ 

0(7 kk' 2 2 

where 5 kk i = 1 if k = k'\ otherwise 8 k k' = 0. Also, £ denotes the adjoint matrix of S. 
The derivative of the quadratic form is computed using 
9£ -1 1 

q g = 2^ kk ' ~ 2 ) s_1 ( e fe® e k' + e k > ® efc)S _1 = {5 k k' ~ 2)(S~ 1 ® s S' 1 )*,*,/, 


where for e k £ , efc K = and 


Then, 


(S 1 (8) s S = -S <g) e fc / + e k > 0 e fe )S x . 


0(0 -Txi S' 1 0-Txi) 
dokk 1 

= (5 kk f -2)(9 - Txi | (S' 1 <g> s S _1 )fcfc' | 6-Txi] 


Using 9 - Txi = (9 - 9i ) + (0* - Txj), 


1 / * 

x(2 — 8 kk >) ( ^ S fcfc/ 


That is 


J2 £ ( 6 - ~°i I ( S_1 S' 1 )^ | 0 - 0<) 

2=1 

N 

^2(9i- Txi I (E- 1 ® s s- 1 )^/ I 0 ? :-rxi] 


( 2 -X)- 1 ^ = --Vs - 1 

v ' <9S 2^ 


+ - (S- 1 <g> s S' 1 | sue- 0i) 0(9-9. 


+ - ^ (s 1 <g> s S 1 | (0j - Txj) ® (0j - Txj 


6 



Setting = 0 and multiplying it with S <g> s E, the result is 

1 N 

S = - ^ - Txi) ® (0i - Txi) + £ ((0 - 0i) ®(6- 0*))] . (21) 

i= 1 

Here is introduced 

S (i) := f ((0-0i)® (0-0i)) 

= [ (e - £<) ® (0 - 0i) ^(0; rxi, E)d*0. (22) 

Jr* A/ (Lj) 


5 Computing Expectations 

To find the maximum place of the likelihood of the latent regression, several integrals need to 
be computed: one for the normalization constant of individual’s likelihood 


A f(Li) = f P( Vi | 0)v(9;rxi,Z)d K e, 

(23) 

Jr k 


another for the expectation of the latent trait 


0i'-= [ 0 T777T^; r *i,£)d*0, 

Jrk A (Li) 

(24) 


and one for the expectation of the variance of the latent trait 



The computations in NAEP are carried out using numerical quadrature for K < 3 and Laplace 
approximation for K > 2. 

The detailed account of adaptive numerical integration is given in (Antal & Oranje, 2007), 
here we only provide a short description of the major tools. In normal population marginal IRT 
the implemented numerical integration procedures aim to compute multidimensional integrals of 
the form 

I:= [ f( x )e-< x \ x) d K x, (26) 

Jr k 

where / is a smooth function. The rectangle rule approximates the integral by 

/= E f(Q)e~ {qlq) Aq, (27) 

q&QV 
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where QV = {qi,... ,qq} R with q. 


Q.min 


/ \ K 

I Qmax Qmin (A 1 ^ (A _ 1 /AN A n _ I Qmax Qmin \ 

Q -1 \ L • • M)- — y q_ 1 j . 


An example would be [—4,4] or [—5,5] with Q = 41. 

Gaussian quadrature (Stoer & Bulirsch, 2002, pp. 171-180) provides a tool that makes it 
possible to compute higher dimensional integrals without compromising computational precision. 
With the i?th Gauss-Hermite quadrature, the integral is approximated as 


1 = Y 

i£QPg Hr 


(28) 


where QVgh r is the set of the zeros of the Rth. Hermite polynomial Hji and QVq Hr is the J\th 
Cartesian power of QVgh r ■ The weights are given by w q = w qi w q2 ... w qK , where 

2 R ~ 1 R\^n 

Wqi ~ Wh^W 2 ' 

If the number of items is relatively large, it is possible that the response likelihood P(yi \ 0, (5) 
has a sharp peak at a location depending on the item parameters (3 and the item responses y t . It 
is then possible that an integration technique based on finite number of function evaluations fails 
to sufficiently capture the behavior of the response likelihood. While this is very uncommon in 
NAEP, where the number of items per subscale rarely exceeds 10, this issue is addressed here for 
the sake of completeness. 1 In addition, it is expected that a method more cognizant of the actual 
behavior of the response likelihood may be computationally more efficient even for tamer response 
likelihoods. 

One way of taking the peak of the response likelihood into consideration finds the modal 
multivariate normal approximation 



p( Vi i M) = ^(M”\sr), 


(30) 


where Of 1 is the mode of P(jji \ 0,(3) and E” 1 is the modal covariance matrix of P(yi \ 0,(3). More 
precisely, 0™ is obtained as the solution of 

dP{yi | 0,(3) 


80 


= 0 , { 0 = 1 ), 


(31) 


and the modal covariance is defined by 


sr = - 


<9 2 log Pfa | 0,(3) 
80 2 


-l 


e=ev 


(32) 



For an arbitrary smooth function g(9), proceed with the integration as follows: 


£{g)i = 


where 


and 


.9(0) P ^^\ (e;Txi,E)d K e 
9(0) C, e?>( 0; r*i, X)d*6> 


9(0) 


Af(u)ip(e,er^T 

P(9i I 0,(3) 


-CM0;0f,^) d K e, 


r* Af(Li)(p(9; 9™, X™ 


sf = (s- i + (srr 1 ) _1 , 
n = sf(s- 1 rx i + (sn- 1 0D, 


Ci = 




(2vr)^/visrpr 

Then, one finds the Cholesky decomposition T,Tj = 2X;? and performs the change of variables 


(33) 

(34) 

(35) 


z = i;- 1 (0-0f), 9 = T IZ + 0V 


(36) 


to obtain the Gauss-Hermite rule 


£(g)t = a 


E 

-? 6 S vz h 


g(TiQ + 9 f) 


Af(L^(T ig + 0f,0™ X£ 




(37) 


When the approximation (30) is good, then the function is approximately constant in 

the range where the normal integration weight ip(0; 9\, X^) is not negligible. 

Because this computation uses additional information about the integrand (i.e., the method 
adapts itself to the integrand), the technique is sometimes referred to as adaptive numerical 
integration. 


6 EM Algorithm 

The previous two sections presented five computational steps towards maximizing the 
likelihood (1) in terms of the latent regression coefficients T and X—(14), (21), and three 
expectations (23, (24), and (25). All of these formulae are implicit though, as they depend 
on parameters to be estimated. One way of solving this system of integral equations is to 
construct an iterative scheme out of the the five building blocks. This is usually termed as an 
expectation-maximization (EM) algorithm; in this case, however, the maximization steps (14) and 
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(21) are trivial. This particular solution can be thought of as a successive approximation scheme. 
The parameters gain an extra index to indicate the iteration status. That is, the initial value for T 
is T®. By choosing T^ = 0 and E ^ = IcIk it is possible to compute the expectations as follows 

M(Li)W=( p( yi | e) v {e-Y^ Xi ^)d K e, (38) 

0 r( 0) x„S(°))d^, (39) 

J R K a/{L iP v > 

and 

Var (0i) (O) = [ (9- ~ef ] ) ®{0- 0< o) ) T^ Xi , X^)d K 9. (40) 

Jrk M(Li)W) 

Using these initial estimates, it is now possible to obtain updates for T and E via the so-called 
M-steps (14) and (21). 

The iteration stops when the prescribed convergence threshold is reached. 


7 Common Population Estimation 

The common population approach aims to maximize the log-likelihood (1) with respect to 
the population mean and and population covariance without the regression effect. That is, one 
assumes that the log likelihood is of the form 


N r 

L = T log/ P{ Vi | 6M0;^)d K 6. 

tt J ^ K 


(41) 


For the derivatives of L with respect to the population parameters p and E, 


dL 

dp 


N 

E 


1 


1 

2 ^ A f{Li) J r k 

N 


P{Vi I 0) 


-^Ie - 1 \ e- i 

dp 


-<p(6\p, T,)d h 9 


1 


EtTTTT [ P ^ I 0)^- 1 (9-pM9;p,^)d K 9. 

“(A I (Li) J r k 


(42) 


Setting jP = 0, the implicit equation for the estimate of p is: 


M = 


N 


N 

E 

i=1 


Q P(.Vi I * ^\jKn 

9 Af(Li) 


(43) 


The computation for the derivative with respect to the population covariance is almost identical to 
the corresponding computation performed for the M-step. It yields the following implicit equation 
for the covariance estimate 


1 


N 

E = vE 

i= 1 * 


(9 - p) ® (9 - £, S)d A '0. 


(44) 
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8 Standard Error 


Assume asymptotic consistency in that the estimates follow a normal distribution around the 
true value, with covariance being the inverse of the information matrix of the likelihood, so that 


I(L) = 


( d 2 L d 2 L \ 
W 1 drdT, 


. d 2 L d 2 L I 

\ asar as 2 / 


Cov(f, E) = I(L) 


-i 


(45) 


Then compute the standard error. Recall that the first derivative of L with respect to T is given 

by 


dL 

d^ki 


Then, 


N 

E 

i= 1 


d 2 L 


P(yj\Q ) 

N{Li 


xup-'iTxi - 0)) fc ^;Tx i ,E)d A 0. 


K, 


d^k'i'd^ki 
N „ 


1 dAf(Li) 


AT (Li) d^k'v 

N - a(E -\Txi-e)) k 


xu(Y, \Txi - 6)) k dm 


E 

i =1 
N 

E 

i =1 


x u 


dlk'i 


dm 


P(yj\e ) 

A [(1, 


xu(T, 1 (Tx i -6))k 


d(p(e-,Txi,T,) K 


dlk'i 


d lx e. 


Performing the prescribed differentiations yields 

N K 


d 2 L 


dlk'i'dlki 


Finally, 


d' 2 L 


dlk'i'dlki 


^ ^ y ^ ^il^il'^kK, ^k'K,' ^i)k' 

i —1 k,k'=1 
N 

— ^2 xuXii'Y, kk , 
i =1 

N K 

-E E x il x il'^kK P‘k'n'^' ((b^i ^)fc(r X{ @) K ')- 

i= 1 k,k'=1 


N 

y2 X i ® x i® 'L^ l )u'kk' 

i =1 

N K 

yz yz ( x i® x i®P‘ ( ' i) ®P‘~ 1 ®P‘- 1 )wnK'kKk'H 

i —1 K,K f = 1 


(46) 


(47) 


(48) 
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a2 t 

The terms are computed as follows: 

d 2 L 


d(7k'k"dlkl 


N 

E 


i av(Li) 


N(Li) da k > k " 

5(s _1 (rxi — 9)) k 


xu(E (Txi-6))kdfj,i 


N 

E 

2=1 

N 

E 


xu- 


do k 'k" 


d m 


First, 


~^Jr k Af(Li) 


1 dAf(Li) 


p(v ‘ m ^(V-Hrx, - »)) t 9 rt* r *’ s W 


3a 


fc'fc" 


that is 


AT (Li) da k i k n 


N 


P(y.i 10)a^;rxi,E) d ^ 


Af(Li 


da 


k'k" 


(A< * V- 5^ f P(yi\e)d<p(e-,Txi,i:) AKa 

(49) = E-( E < r « - o 0)* ^ d *• 

Then, the contribution of (50) is computed: 


1 


Af 


(50) = — ^ xu(Sk'k" - 2 ) ((S 1 < 8 ) s s 1 )fc'fc"(Txj - 0j)) ; 

2=1 


Furthermore, 


Af 


< 51 ) = -E 


AT (Li) 


Pl - Km x il{ z-\r Xi -S,)) t ‘ 


da, 


k’k" 


N 

E 

2=1 


W) 

AA(L, 


® <{ (S _1 (0-^))fc 


d<p(0; Txi, E) 


da k ' k " 


d n 6, 


where the first summand in (55) cancels out the contribution of (53). 
From previous computations, 


d<p(Q‘,Txj,Y) 

da k ' k " 

= --(2 - 4'fe")Efe4»^(6 , ;rx i , E) 

+ ^(2 - 4 >k"){Lxi - 9i | (S' 1 ® s E _1 ) fc / fc // | Txj - 0i)<p(6-,Txi, E) 
+ ^(2 - 5 k 'k")(0 - 6i | (E- 1 E -^k" | 6 - 6i)<p(0; Tx h E) 

-(2 - S k ' k ")(0 - Qi | (E -1 <S) s P‘ 1 )k l k" | 4 - rxi}(^(0; Txi, E). 


(49) 

(50) 

(51) 

(52) 

(53) 

(54) 


(55) 


(56) 
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Finally, 


d 2 L 


da k 'k"d^ki 

1 

2 

N 


( N 

(fa'k" - 2 ) f ~^2 Xil (A^ 1 s 1 )fc , fc"(r^i - 0i]j t 


2—1 


- V [ xup-^e - 9i)) k ({0 - Qi) ® (0 - 0*) I (S- 1 S- 1 


+2 V / a^S" 1 ^ - 0i))*((0 - 9i) ® (rsi - 6i) I (S- 1 ® s E-^fc/fc/^d/ii ]. (57) 

i=i - 7mX / 

a 2 r 

The terms —- are computed using (19) in several steps as follows. First, recall the 


derivatives for E 1 are 


For (E 1 <g) s E *), they are 


as** 


= -( 2 -<W)(£ 


^ ^KHi'kk' • 


(58) 


<9(S ®s S )kk'rr' 
dcT tin 1 

Also, for the normalizations J\f(Li) 


= -(2 - (5 k/6 /)(E 


^ ^ ) KK'kk'rr' • 


i aAT(Lj) 

(-^i) kk! 

= — 9(2 — <W)E k/< . ; — 2 [ 

1 * Jr k 

= ~ 2^ 2 _ ^ k ') S kk' + 2^ 2 ~ 


- Txi I E- 1 I 6» — Ts. 


da K 


-d m 


(E 1 <S> s E 1 ) KK ' 


(59) 


(60) 


and 9; 


dOj 

da K , 


? 5y)(6>;rxi,E) 1 

dcr K K' (p(9;Fxi,T,) 
dif(9;Txi, E) 1 


dm - 9 


1 5A7(Li) 


(9 ~ 9i ) 


(0 - 9i) 


da KK > ip(9',Txi, E) 

- rx* | E” 1 | 9 - Tx 


R x AT (Li) da K 

dm 


dm 


do K 


-dm 


- 2 (2 5k 


-^)® 3 ) |(E“ 1 ® s S- 1 ), 


'23 


-ra*) (E -1 ® s E -1 ) 


'23 


(61) 


13 



Then. 


This, using 


yields 


d£((d - 6 t ) ® (0 - 9 t )) 

d(J KK ; 


{6 - 6i)dni 


- [ («- ft) ® <« - ft) J, 

Je^ A/(T*) acw 

f , a 2\ d( P^ Tx ^ E ) 1 j.. 

+ / (0 O l ) ® (6 Or) d/i*. 

Je^ Wkk' ^(fc^Ta^E) 


<9<t kk / 


- Oi) = (0 - + Oi- 


(0(7 KK f 


-Oi), 


dsao-Or)®^-^)) 


da KK > 


= + ( (ifi- Oi)® 2 - £ W ) 


= £(*) - 


(0 - 0 i )® 2 - £« 


dip(0;Fxi, E) 1 d 
3cr KK / ^(fljTa^E) ^ 
\ a(6» - rx. I s- 1 1 0 - r Xj) 


= E« + -(2 - <W) (S ((0 - 0)f 4 ) - E (i) ® 2 | (E' 1 ® s S _1 ) KK '»j 
+(£ ((o - 0)f 3 ) ® (0i - Txi) | (s- 1 e- 1 )^) 34 


0 - 0)f 2 ® (0i - Txi) ®{0- 


Finally, putting all the above together yields 

d 2 L 

da 

K.K,' dakk' 

N _i _i 

— ^-(2 $kk ')(2 5 kk i)(Yj E )kk'KK' 


®s ^ ) kk ' ^4 


+9(2 _ <w) y~i 


,ag((g-gj)®(g-gj)) 

d(J k.k/ 




-(2 - 4fe')(2 - <W) « gW I ( S_1 S_1 S- 1 )^ 


+(2 - <W) ^ 


- Txj <g> s (6>j - Txj) | (£ 1 <g> s £ 1 


-(2 - 4 fc ')(2 - < W ) ($ - r ®*)® 2 | ( S ’ 1 E - 1 ® s E " 1 )^ 
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9 Newton Method 


9.1 Basic Computations 

This section derives the formulae for the estimation based on Newton’s method. First define 

F : Mx,z,(M) x Mjr(M) — > Mr-^R) x Mr-(R), F = (-Fiji's), 

Fr : M^iCR) x Mr(R) —> Mr-^R), 

Fs : Mr^R) x Mr(R) —> Mr-(R), 


where 


/ AT n 

F r (r, E) = r* - | 37 ( 8 > 37 

\*=i j 


-l 


N 




^ i=l 


l 


AT 


F s (T,S) = [(«< - FxO ® (0i - TsO + E (i) ' 

1=1 


By construction, the equation 


f(t,e) = 0, ((t,e) =?) 


(64) 

(65) 

( 66 ) 


is equivalent to the system of equations presented in (14) and in (21). The Newton method for 
( 66 ) yields the following scheme for the iterative estimates (T( n ),E^): 


(p(n+i)^(n+i)) _ ( r (") ; x;( n )) _ DF[r( n ),s( n )] _1 F(r( n ), e^). 


(67) 


The computation of the derivative 


DF = 


/ dFr dFr \ 
Or 0E 


, dFs dFn I 
\ c)V dT. ) 


( 68 ) 


is as follows. First, 

dFr _ dC 

djki 

dC 

<97 ki 

dF r 
d(7kk’ 


N 


-1 


\i =1 
< N 

E= 


i Xi 


f N 

E 


ddi 


Xi 


K 1=1 


\i=1 

' N \ ~ X N K 

Xi®Xi 

\i=1 / i =1 k=1 


dlki) 

EE*®< 37® E” 1 ®!^) 


IkhiK, • 


Then, 


' N 

E 

v*=l 


-1 


Xi <x> Xi 


' N 

E 

\i=l 


Xj 


dOj 

dcrkk' 


(69) 
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The term is computed in (61). Next, 


dFz 

#7 kl 


N 


N 

2 

'N 


E 

i=l 

N 

E 

i =1 


do, ar ~ „ . // dft 

Xi ® s (ft - Txj) + £ 0 - 


<97 m 07fcz 

v 

' aft ar 


a7w 


(^ - ft) 


o o (ft - rxi) + ^e w 

o^ki o^ki / 2 


Finally, 


(70) 


aF s 

d(Tkk' 


aE 2 

ao- fcfc / tv 


AT 

E 

i= 1 L 


^-rx, ®,(«.-rx i) + iE«) 


(71) 


9.2 Notes on Implementation 

From the actual forms of the second derivatives—most notably (63)—it is clear that direct 
application should be done very carefully. The number of function evaluations in any numerical 
integration scheme with N qp quadrature points in N^im dimensions ( N c i irn = number of subscales) 
for the expectation 

-«)?*) 

is 

Ni N Ndim 

which is 5 4 • 12 5 = 155,520,000 for a reasonable choice of five dimensions and 12 quadrature 
points. Now, there is a relatively high degree of symmetry in the computation that can be used 
to decrease this number significantly. The point is that a careful analysis should be performed to 
find the best way to use this symmetry and to determine if it brings down the number of function 
evaluations to a manageable range. 

With the Newton method, the computation might not worth the effort, since there already is 
a well-functioning (even if slow) estimation method (the EM-algorithm). For the standard error 
(where the exact same computations are needed to be performed), the effort is justified only if the 
added correction may significantly alter the existing approximate estimator. 

In both cases, the discussions are postponed for a future research study. 


E ((0 


10 Conclusion 

This rather technical paper presented the fundamental computations underlying the marginal 
maximum likelihood estimation of the latent regression model as used in NAEP. Special emphasis 
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was given to the numerical integration method, which, while an artifact of the calculation, can play 
a decisive role in the manageability of the computations required for the parameter estimation. 

While the computations of the formulae for the asymptotic standard error and that of the 
second derivatives of the likelihood appearing in the Newton method pose no theoretical problems, 
practical implementations may encounter serious difficulties. This is mainly due to the appearance 
of the expectations of the third and fourth order tensors (9 — 9)® 3 and (9 — 0) , since the 

integration should be performed for each matrix element separately. While the existing symmetry 
may be used to reduce the computational burden, a practically useful way is yet to be found. 

Another direction for future study could be the comparison of the theoretical standard error 
formulae to both simulation study empirical standard error (mainly to check appropriateness of 
asymptotic approximation) and to the existing NAEP standard error (to ensure consistency of 
practice with theory). 
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Notes 


1 Note, that, in general, the more items, the sharper is the peak. 
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